City Pop vs U.S. Pop
In this portfolio, I will explore the differences and features of two genres, City Pop (CP) and U.S. Pop (UP), originating from Japan and the U.S. respectively. CP is a genre from Japan that appeared in the late 70’s and reached its peak in popularity in the 80’s. I will be comparing two playlists of three artists each, one Japanese and one American. The Japanese playlist consists of the artists Taeko Onuki, Miki Matsubara and Anri. The US counterparts are Michael Jackson, Whitney Houston and Madonna. I chose these corpora because I want to explore whether there are distinct differences between (city) pop as it was in Japan in the 80’s and the pop that was popular in the Western world in the same decade. Japanese CP was influenced by Western music, so I expect many similarities in the use of sound, instruments and types of rhythm. However, I am particularly interested in whether the genres differ in the prevalence of bass and in tempo. It is also interesting to see whether there are differences in other aspects such as timbre. I am unsure to what extent the genres differ, which is what will be explored here.
I chose this topic because of my personal taste. Ever since I was a child, pop has been a significant part of my life and upbringing, as it was perhaps the main genre both my parents listened to. Due to rising popularity on the internet, CP has gained a lot of traction and has even spawned new sub-genres, e.g., future funk. I think this is due to its similarity to Western pop as well as the ability of many of the songs to sound modern by today’s standards.
As I have chosen three artists to represent each genre (and its varieties), there might be nuances and representations I am missing. Taeko Onuki, Miki Matsubara and Anri were chosen for their popularity on Spotify (their number of listeners as well as the number of plays of their tracks). I also have to mention that personal selection played a part. Nevertheless, the selections were centered around three albums from each artist. The same method was used in choosing the Western counterparts. The genres are very broad, despite their popularity, and some varieties might have been overlooked. However, the artists’ popularity is a strength, as many casual listeners will know these songs.
Typical, and popular, tracks from the Japanese playlist are:
These songs are typical in the sense that they make prominent use of basslines and clear rhythms, are very stereotypically pop, and have a rich timbre with many layers of sound, e.g., instruments.
The western counterparts have typical tracks like:
These last three tracks especially have the typical and distinct features of 80’s pop, namely the sharp drums, the heavily synthesized piano sounds and what I think of as an almost “dreamy” sound.
Atypical songs from both playlists can include:
To explore possible differences and features of this corpus, I will first start with a classification model, a random forest. Then I will explore track-level features of the two genres, focusing on what the classification model singles out in my corpus. After that I will go further into musical aspects such as timbre and chroma, focusing on self-similarity matrices, chromagrams and chordograms. Finally, I will discuss what this portfolio has explored and what can be concluded from it.
# A tibble: 2 × 3
  class    precision recall
  <fct>        <dbl>  <dbl>
1 City Pop     0.793  0.742
2 U.S. Pop     0.758  0.806
To compute the model, I capped the playlists at 31 songs in each group. Using the formula (TP+TN)/total, according to this website, the accuracy of the model as of right now is (24+24)/62 ≈ 0.77. According to various sources like this one, that is a relatively good accuracy while still being within a realistic interval.
From this, I can assume that the model is decent at telling City Pop and U.S. Pop apart. Under the next header, I will look at which features were the most important in the classification of these playlists.
# A tibble: 2 × 3
  class    precision recall
  <fct>        <dbl>  <dbl>
1 City Pop     0.828  0.774
2 U.S. Pop     0.788  0.839
          Truth
Prediction City Pop U.S. Pop
  City Pop       24        5
  U.S. Pop        7       26
I hope this is the right way of getting the confusion matrix out of the random forest model. If I apply the same formula as in the previous slide to the matrix above, I get an accuracy of (24+26)/62 ≈ 0.81, slightly higher than the 0.77 from the last one.
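As a cross-check, the precision, recall and accuracy can be re-derived by hand from the confusion matrix printed above. The sketch below is plain Python rather than the R code that produced the output, and the function names are my own; only the four counts come from the matrix.

```python
# Confusion matrix as printed above: rows are predictions, columns are truth.
CLASSES = ["City Pop", "U.S. Pop"]
CONFUSION = [
    [24, 5],   # predicted City Pop: 24 truly City Pop, 5 truly U.S. Pop
    [7, 26],   # predicted U.S. Pop: 7 truly City Pop, 26 truly U.S. Pop
]


def accuracy(matrix):
    """Share of all tracks on the diagonal, i.e. (TP + TN) / total."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision(matrix, k):
    """Correct predictions of class k divided by all predictions of class k."""
    return matrix[k][k] / sum(matrix[k])


def recall(matrix, k):
    """Correct predictions of class k divided by all true members of class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


def summarize(matrix, classes):
    """Return ({class: (precision, recall)}, accuracy), rounded to 3 decimals."""
    per_class = {
        name: (round(precision(matrix, k), 3), round(recall(matrix, k), 3))
        for k, name in enumerate(classes)
    }
    return per_class, round(accuracy(matrix), 3)


per_class, overall = summarize(CONFUSION, CLASSES)
for name, (p, r) in per_class.items():
    print(f"{name}: precision={p}, recall={r}")
print(f"accuracy={overall}")
```

The per-class values reproduce the precision and recall table, which suggests the matrix was read with predictions as rows and truth as columns.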
Regarding the feature selection, the graph shows that the timbre coefficient c11, followed by the track-level feature loudness, followed in turn by the timbre coefficient c1, are the most important features that help the model classify the two genres. The importance of both loudness and the timbre coefficient c1 is quite humorous, as it was said in one of the lectures that this timbre coefficient is roughly equivalent to loudness. After those, primarily timbre features are of the most importance. This is quite interesting, as at the very start of the course (without any knowledge of musical terminology) I said that City Pop songs have a lot of layers to them. This was meant as “there is a lot going on” and that it sounds different. This is probably why
For the final portfolio, I will use these features in order to improve the existing visualizations I have (and I have the ability to remove some of them). Yay!
Here one can see the tempo and loudness features of the two genres. There is a definite preference for a higher volume in the City Pop group than in the other group. This graph was made as a direct consequence of the feature importance found in the previous section. There is a distinct difference in loudness between the two genres, while both of them tend to stay around the same tempo, i.e., 120-125 beats per minute. This coincidentally corresponds well to the study by Moelants (2002), which says that humans seem to prefer this tempo.
However, many tracks do not fit this pattern, as there are points in the 90-120 bpm range in CP and in the 90-150 bpm range in UP. One could perhaps say that UP has more tracks sitting right at 120 bpm.
Nevertheless, in terms of loudness, UP stops at just above -9 dB. What is very clear is that CP does not stop there, as multiple tracks are above -6 dB. If I am interpreting this correctly, it means that CP is louder than UP.
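To get a feel for what a gap between roughly -9 dB and -6 dB means, the decibel difference can be converted into a ratio. This is a rough sketch: the two levels are approximate values read off the plot, the function names are my own, and a power ratio is not the same thing as perceived loudness.

```python
def db_to_power_ratio(delta_db):
    """Power ratio corresponding to a level difference in decibels."""
    return 10 ** (delta_db / 10)


def db_to_amplitude_ratio(delta_db):
    """Amplitude ratio corresponding to a level difference in decibels."""
    return 10 ** (delta_db / 20)


# Approximate levels read off the plot (assumptions, not exact track values).
UP_CEILING_DB = -9.0   # roughly where the U.S. Pop tracks stop
CP_EXAMPLE_DB = -6.0   # level that several City Pop tracks exceed

delta = CP_EXAMPLE_DB - UP_CEILING_DB
print(f"difference: {delta} dB")
print(f"power ratio: {db_to_power_ratio(delta):.2f}")
print(f"amplitude ratio: {db_to_amplitude_ratio(delta):.2f}")
```

A 3 dB difference corresponds to about twice the signal power, so the gap between the loudest tracks of the two playlists is not trivial.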
Here is a plot of the effect of energy on danceability, with the size of the points as well as the band around the line indicating the tempo of the songs. One can see that there is more of a linear trend for US Pop, indicating that up until an energy of around 0.5 there seems to be a correlation between energy and danceability. Tempo, however, does not seem to follow any pattern at first glance. For the City Pop playlist, there seems to be a slight curve at the beginning of the graph, but overall the trend of danceability against energy is very even. At first glance, there also seems to be no trend for tempo.
However, one can tell that U.S. Pop seems to have more of a positive linear correlation between energy and danceability, more than its Eastern counterpart in any case.
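One way to put a number on this visual impression would be to compute Pearson's correlation coefficient between energy and danceability separately for each playlist. The sketch below shows the calculation only; the example values are made-up placeholders, not tracks from the corpus, and the function name is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (energy, danceability) values, for illustration only.
example_energy = [0.30, 0.45, 0.60, 0.75, 0.90]
example_danceability = [0.55, 0.62, 0.70, 0.71, 0.74]

print(round(pearson_r(example_energy, example_danceability), 2))
```

A value close to 1 for one playlist and close to 0 for the other would support the reading of the plot. With only 31 tracks per group, outliers can move the coefficient considerably.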
Across the two groups, there are two outliers that especially draw one’s eye: “Billie Jean” by Michael Jackson and “横顔” by Taeko Onuki. Compared to the other songs, both have very low energy while at the same time having high danceability.
“Billie Jean” was listed as an outlier for the Pop group in the graph of energy’s effect on danceability, in the sense that Spotify rates it as a non-energetic song with an extremely high danceability. As one can see in this chromagram, “Billie Jean” has several areas where the magnitude is over 0.75. A couple of patterns that arise are the use of D and C#/Db just before 100 and 200 seconds. They form an almost pyramid-like shape. Judging by the chroma activity, one can see that it is a rather energetic song, despite being rated as a low-energy one.
Here is the tempogram of the outlier identified in the graph of energy’s effect on danceability, where the energy is low but danceability is high. This is a completely standard tempogram, in the sense that the tempo estimation ran cleanly on this piece, with no disturbances or anything that went wrong. “Billie Jean” is a very stereotypical song in this regard, with a tempo of approximately 120 bpm and a very steady beat and rhythm that do not seem to change. This is probably part of the reason the overall danceability of this piece is very high, in spite of its very low energy.
The histogram of the distribution of tempi showed a couple of outliers in both genres. “A HOPE FROM SAD STREET” by Anri is one of them. Here, in comparison to “Billie Jean”, there were apparently some issues when estimating the tempo, as there are multiple yellow lines that flicker across the entire piece. However, it was not a complete failure, as the tempogram shows not one but two tempo octaves. It is not as stable as the previous piece, but the lines are clearly there.
This flickering could be attributed to the way this song is arranged: there are many layers of sound, including the bassline, a layer of trumpet and a choir, and there are points where the song stops and picks up again. The bassline is often replaced by piano and guitar solos before returning to the “status quo”. As far as I can tell, these moments correspond to the disturbed points in the tempogram.
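The two tempo octaves seen in the tempogram are lines at a tempo and at its double (or half), which describe the same underlying pulse. A small sketch of how two tempo candidates could be checked for this relationship follows. The 80-160 bpm reference range, the tolerance and the function names are all assumptions of mine, not part of the analysis above.

```python
import math


def fold_tempo(bpm, low=80.0, high=160.0):
    """Fold a tempo into [low, high) by repeated doubling or halving."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    if high != 2 * low:
        raise ValueError("reference range must span exactly one octave")
    while bpm < low:
        bpm *= 2
    while bpm >= high:
        bpm /= 2
    return bpm


def octave_distance(bpm_a, bpm_b):
    """Distance of the tempo ratio from the nearest power of two, in octaves."""
    ratio = math.log2(bpm_a / bpm_b)
    return abs(ratio - round(ratio))


def same_tempo_class(bpm_a, bpm_b, tolerance=0.05):
    """True if the two tempi are (nearly) octave-related, e.g. 60 and 120 bpm."""
    return octave_distance(bpm_a, bpm_b) <= tolerance


# Hypothetical tempo candidates, for illustration only.
print(fold_tempo(60.0), fold_tempo(240.0))
print(same_tempo_class(61.0, 120.0), same_tempo_class(90.0, 120.0))
```

Treating octave-related lines as one tempo class is a way to read such a tempogram as showing a single pulse rather than two competing tempi.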
So far we can see that one difference between US Pop and City Pop, based on the corpus I am using, is that energy definitely seems to have an effect on danceability in US Pop, up to a certain level. However, there are many outliers that can skew this, which will be identified shortly. Furthermore, tempo does not seem to have a definite relationship with either energy or danceability.
Furthermore, danceability seems to have a positive effect on valence in both genres. However, one can see that US Pop has more songs that are both more danceable and happier than City Pop does.
This makes sense, given that, according to the histogram, none of the City Pop songs go above a certain threshold of valence, unlike US Pop. This could indicate that City Pop is generally less “happy”.
One outlier I want to talk about in particular is the track in the energy, danceability and tempo plot whose energy is quite low in comparison to other tracks, but whose danceability is one of the highest in the group. While I am not sure how to point this out in the plot, I have managed to identify it as “Billie Jean” by Michael Jackson. A separate chromagram has been made in order to account for this.